In [1]:
# standard imports
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# using seaborn plotting
import seaborn as sns
sns.set_style('white')
sns.set_context('paper')

plt.rcParams['figure.figsize'] = (15, 12)

Convolution using SciPy

We can approximately compute the general case using SciPy's convolution and interpolation routines. First, a basic implementation to give an idea of what's going on:


In [4]:
simple = stats.uniform(loc=2, scale=3)
errscale = 0.25
err = stats.norm(loc=0, scale=errscale)

# We cannot analytically convolve continuous PDFs in general,
# so we make a probability mass function on a fine grid for FFT convolution.
delta = 1e-4
big_grid = np.arange(-10, 10, delta)

pmf1 = simple.pdf(big_grid) * delta
pmf2 = err.pdf(big_grid) * delta

from scipy import signal
conv_pmf = signal.fftconvolve(pmf1, pmf2, 'same')

In [5]:
print("Grid length, sum(gauss_pmf), sum(uni_pmf), sum(conv_pmf):")
print(len(big_grid), sum(err.pdf(big_grid) * delta), sum(simple.pdf(big_grid) * delta), sum(conv_pmf))
conv_pmf = conv_pmf / sum(conv_pmf)

plt.plot(big_grid,pmf1, label='Tophat')
plt.plot(big_grid,pmf2, label='Gaussian error')
plt.plot(big_grid,conv_pmf, label='Sum')
plt.xlim(-3,max(big_grid))
plt.legend(loc='best'), plt.suptitle('PMFs')


Grid length, sum(gauss_pmf), sum(uni_pmf), sum(conv_pmf):
200000 1.0 1.0 1.0
Out[5]:
(<matplotlib.legend.Legend at 0x7f57efeffb38>,
 <matplotlib.text.Text at 0x7f57ee2cb278>)
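
The introduction above also mentions SciPy's interpolation routines; here is a minimal sketch (not part of the original notebook, with illustrative names conv_pdf, pdf_interp and cdf_interp) of how the gridded result could be turned back into callable PDF/CDF approximations with scipy.interpolate.interp1d:


In [ ]:
from scipy import interpolate

conv_pdf = conv_pmf / delta     # rescale the PMF back to an approximate density
conv_cdf = np.cumsum(conv_pmf)  # discrete CDF on the same grid

# Callable approximations of the convolved PDF and CDF
pdf_interp = interpolate.interp1d(big_grid, conv_pdf,
                                  bounds_error=False, fill_value=0.0)
cdf_interp = interpolate.interp1d(big_grid, conv_cdf,
                                  bounds_error=False, fill_value=(0.0, 1.0))

print(pdf_interp(3.5), cdf_interp(3.5))  # evaluate at an arbitrary point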

The algorithm assumes that one knows the claim count distribution $T$ (the probability distribution of the number of claims that will occur) and the severity distribution of a single claim $S$ (the distribution of the amount of a single claim).

The algorithm computes the aggregate loss distribution, $AGG = S_1 + S_2 + \dots + S_T$

(the distribution of the total claim amount). The algorithm applies to arbitrary frequency and severity distributions.

Suppose a claim count distribution, a severity distribution, and the corresponding aggregate distribution are specified. In regard to the severity distribution, suppose further that the probability of any given claim being excess of a given attachment point $A$ is $\alpha$. Suppose it is desired to compute the aggregate distribution for claims excess of $A$. For example, the aggregate distribution might be based on a Poisson claim count distribution with parameter 1,000 (i.e., the number of expected claims is 1,000) and a Weibull severity distribution with mean 10,000 and coefficient of variation 8. If $A$ is 100,000, then $\alpha$ is 0.0197.
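
As a rough cross-check of that example (a sketch, not from the original text): given the Weibull mean and coefficient of variation, the shape and scale can be recovered numerically and the excess probability $\alpha = P(X > A)$ evaluated. The names cv_equation, shape and scale below are illustrative, and the parameterisation is that of scipy.stats.weibull_min:


In [ ]:
from scipy import stats
from scipy.optimize import brentq
from scipy.special import gamma

mean, cv, A = 10000, 8.0, 100000

# The coefficient of variation depends only on the Weibull shape c:
# CV^2 + 1 = Gamma(1 + 2/c) / Gamma(1 + 1/c)^2
def cv_equation(c):
    return gamma(1 + 2 / c) / gamma(1 + 1 / c) ** 2 - (1 + cv ** 2)

shape = brentq(cv_equation, 0.1, 1.0)   # heavy-tailed severity, so shape < 1
scale = mean / gamma(1 + 1 / shape)     # scale chosen to match the mean

alpha = stats.weibull_min(shape, scale=scale).sf(A)  # P(X > A)
print(shape, scale, alpha)  # alpha should land close to the 0.0197 quoted above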

Monte Carlo method

Perhaps the simplest and often most direct approach is Monte Carlo simulation. The Monte Carlo method involves the following steps:

  1. Choose a severity of loss and frequency of loss distribution model.
  2. Simulate the number of losses and individual loss amounts, then calculate the corresponding aggregate loss.
  3. Repeat this many times (at least 5,000) to obtain an empirical aggregate loss distribution.

To illustrate the application of this method, let's work through the following example. It generates an aggregate loss distribution using a Poisson frequency of loss model and a Weibull severity of loss model. Enter the parameter values of each distribution and the number of simulations required.

Algorithm for Simulation

  • For each $i$ in $1$ to the number of Monte Carlo simulation runs $K$:
  • simulate the number of losses $N_i$
  • simulate $N_i$ loss sizes $X_{i,1}, \dots, X_{i,N_i}$
  • calculate $L_i = \sum^{N_i}_{j=1} X_{i,j}$

Doing this, you get a sample of losses $L_1, \dots, L_K$ on which you can compute histograms, density fits, and other calculations.

EDIT: on this sample you could try to fit a loss distribution (e.g. a Gamma or translated Gamma, see here) by maximum likelihood or the method of moments. But you can apply the method of moments even without Monte Carlo, because if you assume that the number of losses and the loss sizes are independent, then $E[L] = E[N] \, E[X]$ and $V[L] = E[N] \, V[X] + E[X]^2 \, V[N]$.
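
As a quick illustration of those formulas (a sketch; the parameters match the simulation cells below and the variable names are illustrative), the first two moments of the aggregate loss can be read off directly from SciPy's frozen distributions:


In [ ]:
from scipy import stats

poisson_lambda, weibull_alpha = 10, 1.79

N = stats.poisson(poisson_lambda)      # claim count distribution
X = stats.weibull_min(weibull_alpha)   # single-claim severity distribution

# Moments of the aggregate loss L, assuming N and the claim sizes are independent
E_L = N.mean() * X.mean()
V_L = N.mean() * X.var() + X.mean() ** 2 * N.var()

print("E[L] =", E_L, " V[L] =", V_L)

The empirical mean and variance of the simulated loss sample further down should come out close to these values.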


In [64]:
from scipy import stats

simulations = 5000

poisson_lambda = 10
weibull_alpha = 1.79

# Draw samples to visualise the frequency and severity distributions
frequency = stats.poisson(poisson_lambda).rvs(size=simulations)
severity = stats.weibull_min(weibull_alpha).rvs(size=simulations)

In [65]:
sns.distplot(frequency, kde=False)
plt.title(r"Frequency distribution: Poisson Distribution with $\lambda$ = %g" % poisson_lambda, fontsize=20)


Out[65]:
<matplotlib.text.Text at 0x7f57e56dc080>

In [66]:
sns.distplot(severity, kde=False)
plt.title(r"Severity distribution: Weibull Distribution with $\alpha$ = %g" % weibull_alpha, fontsize=20)


Out[66]:
<matplotlib.text.Text at 0x7f57e558c978>

In [67]:
loss = []
for i in range(simulations):
    n_claims = frequency[i]  # simulated claim count for run i
    severity_run = stats.weibull_min(weibull_alpha).rvs(size=n_claims)  # claim sizes for run i
    loss.append(np.sum(severity_run))  # aggregate loss for run i

In [68]:
sns.distplot(loss, kde=False)
plt.title("Aggregate Loss distribution", fontsize=20)


Out[68]:
<matplotlib.text.Text at 0x7f57e541c160>
